Reinforcement learning under circumstances beyond its control

Author

Chris Gaskett

Abstract

Decision theory addresses the task of choosing an action and provides robust criteria for decision-making under uncertainty or risk. It has been applied to produce reinforcement learning algorithms that manage uncertainty in state transitions. However, because reinforcement learning tasks are multiple-step decision problems, performance under uncertainty about the selection of future actions must also be considered. This work proposes β-pessimistic Q-learning, a reinforcement learning algorithm that does not assume complete control.
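The abstract does not state the update rule, so the following is only a minimal sketch of one plausible reading. It assumes that "not assuming complete control" means the bootstrap target blends the best and worst next-state action values, weighted by β, so that β = 0 recovers ordinary Q-learning and β = 1 assumes the worst future action is always taken. The function names, the dictionary-based Q-table, and the default parameter values are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def beta_pessimistic_target(q_next, reward, gamma, beta):
    """Bootstrap target that mixes optimistic and pessimistic next-state values.

    Assumption (not from the abstract): with probability-like weight `beta`
    the agent expects the worst next action to be taken, otherwise the best.
    """
    best = max(q_next)
    worst = min(q_next)
    return reward + gamma * ((1.0 - beta) * best + beta * worst)


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, beta=0.2):
    """Apply one tabular update step and return the new Q(state, action)."""
    q_next = [q[(next_state, a)] for a in actions]
    target = beta_pessimistic_target(q_next, reward, gamma, beta)
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example: next-state values are 2.0 and 4.0.
q_table = defaultdict(float)
q_table[("s1", "left")] = 2.0
q_table[("s1", "right")] = 4.0
new_value = q_update(q_table, "s0", "left", reward=1.0, next_state="s1",
                     actions=["left", "right"], alpha=0.5, gamma=0.5, beta=0.5)
```

Under this reading, raising β makes the learned values more conservative in states where some available action is much worse than the best one, which is the situation the abstract describes as uncertainty about the selection of future actions.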

Similar resources

Learning to control dynamic systems via associative reinforcement learning

and the lack of explicit instructional information about how to perform a given control task. Under these circumstances, techniques developed by artificial intelligence researchers for "learning from examples," including the "supervised learning" techniques studied by neural network researchers, are not directly applicable because these techniques are based on the availability of training inform...

Reinforcement Learning Methods for Continuous-Time Markov Decision Problems

Semi-Markov Decision Problems are continuous-time generalizations of discrete-time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-time Dynamic Programming. After revie...

Learning Strategies for Mid-Level Robot Control: Some Preliminary Considerations and Experiments

Versatile robots will need to be programmed, of course. But beyond explicit programming by a programmer, they will need to be able to plan how to perform new tasks and how to perform old tasks under new circumstances. They will also need to be able to learn. In this article, I concentrate on two types of learning, namely supervised learning and reinforcement learning of robot control programs. ...

The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive.

A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls ...

Reinforcement Learning by Comparing Immediate Reward

This paper introduces an approach to reinforcement learning that compares immediate rewards using a variation of the Q-learning algorithm. Unlike conventional Q-learning, the proposed algorithm compares the current reward with the immediate reward of the past move and works accordingly. Relative-reward-based Q-learning is an approach towards interactive learning. Q-Learning is a model free re...

Publication date: 2003
